Japanese-Chinese Phrase Alignment Using Common Chinese Characters Information
Abstract
We describe a method that automatically detects common Chinese characters shared between Japanese and Chinese using freely available resources, and we verify the effectiveness of this detection method. We use a joint phrase alignment model on dependency trees and report experiments aimed at improving Japanese-Chinese alignment quality by incorporating the common Chinese character information found by the proposed detection method into the alignment model. Experimental results on Japanese-Chinese phrase alignment show that our approach achieves an AER 0.73 points lower than the baseline system.
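The abstract reports its gain as alignment error rate (AER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted link set, S the sure gold links, and P the possible gold links. The function name and the toy link sets are assumptions made for illustration; they are not taken from the paper, which may compute AER over phrase-level links in its own way.

```python
def alignment_error_rate(predicted, sure, possible):
    """Standard AER over sets of (source_index, target_index) links.

    Hypothetical helper for illustration only; not the paper's code.
    """
    a = set(predicted)
    s = set(sure)
    # By convention every sure link is also a possible link.
    p = set(possible) | s
    if not a and not s:
        return 0.0  # nothing predicted and nothing to find
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy example (made-up links): two correct links and one spurious link.
predicted_links = {(0, 0), (1, 1), (2, 3)}
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 2)}
aer = alignment_error_rate(predicted_links, sure_links, possible_links)
print(f"AER = {aer:.2f}")
```

Lower is better: an alignment that exactly reproduces the sure links scores 0, and each spurious or missing link pushes the value toward 1.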
Similar resources
Japanese-Chinese Phrase Alignment Exploiting Shared Chinese Characters
Common Chinese characters between Japanese and Chinese have been shown to be effective in Japanese-Chinese phrase alignment. Besides common Chinese characters, Japanese and Chinese also share many other semantically equivalent Chinese characters. However, there are no available resources for characters of this kind. In this paper, we propose a statistical method aiming to detect these ...
Exploiting Shared Chinese Characters in Chinese Word Segmentation Optimization for Chinese-Japanese Machine Translation
Unknown words and word segmentation granularity are two main problems in Chinese word segmentation for Chinese-Japanese Machine Translation (MT). In this paper, we propose an approach that exploits common Chinese characters shared between Chinese and Japanese in Chinese word segmentation optimization for MT, aiming to solve these problems. We augment the system dictionary of a Chinese segmenter b...
Chinese-Japanese Cross Language Information Retrieval: A Han Character Based Approach
In this paper, we investigate cross language information retrieval (CLIR) for Chinese and Japanese texts utilizing the Han characters, the common ideographs used in writing Chinese, Japanese and Korean (CJK) languages. The Unicode encoding scheme, which encodes the superset of Han characters, is used as a common encoding platform to deal with the multilingual collection in a uniform manner. We discu...
Chinese-Japanese Clause Alignment
Bi-text alignment is useful to many Natural Language Processing tasks such as machine translation, bilingual lexicography and word sense disambiguation. This paper presents a Chinese-Japanese alignment at the clause level. After describing some characteristics of Chinese-Japanese bilingual texts, we first investigate some statistical properties of a Chinese-Japanese bilingual corpus, including...
Uniform Indexing and Retrieval Scheme for Chinese, Japanese, and Korean
This paper reports on our work at the third NTCIR workshop on the subtasks of Chinese, Japanese, and Korean monolingual information retrieval (IR). A Chinese IR system is applied to all document sets in these three languages. Based on the n-gram indexing model and a phrase formulation method to extract longer key terms for indexing, no language-dependent modifications were made to apply the sys...
Publication date: 2011